ABSTRACT

We present the two-point correlation function (2PCF) of narrow-line active galactic nuclei (AGN) 
selected within the First Data Release of the Sloan Digital Sky Survey. Using a sample of 13605 AGN 
in the redshift range 0.055 < z < 0.2, we find that the AGN auto-correlation function is consistent 
with the observed galaxy auto-correlation function on scales $0.2\,h^{-1}$ Mpc to $> 100\,h^{-1}$ Mpc. The AGN
hosts trace an intermediate population of galaxies and are not detected in either the bluest (youngest) 
disk-dominated galaxies or many of the reddest (oldest) galaxies. We show that the AGN 2PCF is 
dependent on the luminosity of the narrow [OIII] emission line ($L_{\rm [OIII]}$), with low $L_{\rm [OIII]}$ AGN having
a higher clustering amplitude than high $L_{\rm [OIII]}$ AGN. This is consistent with lower activity AGN
residing in more massive galaxies than higher activity AGN, and with $L_{\rm [OIII]}$ providing a good indicator
of the fueling rate. Using a model relating halo mass to black hole mass in cosmological simulations,
we show that AGN hosted by $\sim 10^{12}\,M_\odot$ dark matter halos have a 2PCF that matches that of the
observed sample. This mass scale implies a mean black hole mass for the sample of $M_{\rm BH} \sim 10^{8}\,M_\odot$.
Subject headings: galaxies: active — galaxies: formation — galaxies: statistics 
1. INTRODUCTION

The clustering of galaxies as a function of their 
properties provides important constraints on models of 
galaxy formation and evolution. Such clustering is often
measured using the two-point auto-correlation function
(2PCF; see Peebles 1980). In hierarchical models of
structure formation, the amplitude of the 2PCF depends
upon the mass of the dark matter halos (i.e., more
massive halos are clustered more strongly; Kaiser 1984),
while the shape of the 2PCF can depend upon the details
of how galaxies reside in those dark matter halos
(Zehavi et al. 2004). For example, the amplitude and
slope of the 2PCF are lower for blue galaxies than for
galaxies with the reddest colors (e.g., Davis & Geller
1976; Zehavi et al. 2002).

In this letter, we continue our study of the relation
between the environment of galaxies in the Sloan Digital
Sky Survey (SDSS; York et al. 2000) and their observed
physical properties (see Gomez et al. 2003; Miller et al.
2003; Balogh et al. 2004). In particular, we present
the redshift-space 2PCF for a subset of SDSS galaxies
spectroscopically classified as narrow-line active
galactic nuclei (AGN; Miller et al. 2003). Our analysis has
two advantages over previous measurements of the AGN
2PCF: a larger sample size (in number and area) and
homogeneous selection criteria (compare to Table 1 of
Brown et al. (2001) for previous AGN 2PCF measurements).
In addition, the data are now large enough to study both
volume-limited subsamples and how AGN or AGN host galaxy
properties affect the 2PCF.

2. DATA 

We use the main galaxy sample data (Strauss et al.
2002) from the First Data Release (DR1) of the SDSS
(see Abazajian et al. 2003). To select AGN, we have used
the methodology presented in Section 2.1 of Miller et al.
(2003), where the AGN are classified using the
emission-line flux ratios log([OIII]/H$\beta$) versus
log([NII]/H$\alpha$) (see Kewley et al. 2001), or simply
log([NII]/H$\alpha$) > -0.2 if [OIII] or H$\beta$ are not
measured (see also Carter et al. 2001; Brinchmann et al.
2004). We remove all galaxies from areas with high seeing
values (> 2") and r-band Galactic extinction > 0.4
magnitudes. These restrictions produce a sample of 72455
SDSS DR1 galaxies within 0.055 < z < 0.2, from which we
classify 13605 galaxies as AGN. This fraction of AGN
(18%) is consistent with the findings of Miller et al.
(2003) and Brinchmann et al. (2004). We discuss
implications of our classifications in Section 4.1.
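
The line-ratio selection described above can be sketched as follows. This is only an illustration and not the selection code of Miller et al. (2003): it assumes that the dividing line in the log([OIII]/H$\beta$) versus log([NII]/H$\alpha$) plane is the Kewley et al. (2001) maximum-starburst curve, log([OIII]/H$\beta$) = 0.61/(log([NII]/H$\alpha$) - 0.47) + 1.19, and the function name and arguments are hypothetical.

```python
import math


def is_agn(nii_ha, oiii_hb=None):
    """Classify an emission-line galaxy as AGN from linear flux ratios.

    nii_ha  : [NII]/Halpha flux ratio
    oiii_hb : [OIII]/Hbeta flux ratio, or None if either line is unmeasured
    """
    x = math.log10(nii_ha)
    if oiii_hb is None:
        # Fallback quoted in the text: log([NII]/Halpha) > -0.2
        return x > -0.2
    y = math.log10(oiii_hb)
    # Assumed demarcation: Kewley et al. (2001) maximum-starburst curve.
    # The curve diverges at x = 0.47; anything to the right counts as AGN.
    if x >= 0.47:
        return True
    return y > 0.61 / (x - 0.47) + 1.19
```

A galaxy with both ratios measured is tested against the curve, while one lacking [OIII] or H$\beta$ falls back to the single-ratio cut.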

3. AGN AND GALAXY CORRELATION FUNCTIONS 

We account for the survey geometry (or mask) by con- 
structing random catalogs that match both the survey 
angular and radial selection functions. We first con- 
struct a random catalog that has the same angular mask 
as the real data. We then construct the radial selec- 
tion function by smoothing the observed redshift dis- 
tributions with a Gaussian of width z = 0.025. These 
smoothed redshift distributions are used to randomly as- 
sign redshifts to the data points in our random cata- 
logs, which are ten times larger than the real datasets. 
We note that our conclusions are robust to the choice 
of width for the smoothing kernel. We calculate the
2PCF using the Landy & Szalay (1993) estimator and
estimate the covariance using the jack-knife resampling
technique (e.g., Lupton 1993; Zehavi et al. 2002),
splitting the angular mask of our data into 32 subsections of
$\sim 10^\circ \times 5^\circ$ (or $\sim 60 \times 30\,h^{-1}$ Mpc at z = 0.1). We use
$H_0 = 100$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_m = 0.3$, and $\Omega_\Lambda = 0.7$.
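
As a rough sketch of the two ingredients named above, the code below evaluates the Landy & Szalay (1993) estimator, $\xi = (DD - 2DR + RR)/RR$ with normalized pair counts, and a jack-knife covariance. The pair counts are taken as given, the function names are hypothetical, and the covariance assumes the common delete-one form in which each of the N estimates omits one subsection. The text does not state which jack-knife variant was used.

```python
def landy_szalay(dd, dr, rr, n_data, n_rand):
    """Landy & Szalay (1993) estimator in each separation bin.

    dd, dr, rr     : raw pair counts per bin (data-data, data-random,
                     random-random)
    n_data, n_rand : number of data and random points
    """
    norm_dd = n_data * (n_data - 1) / 2.0
    norm_dr = float(n_data * n_rand)
    norm_rr = n_rand * (n_rand - 1) / 2.0
    xi = []
    for d, x, r in zip(dd, dr, rr):
        d, x, r = d / norm_dd, x / norm_dr, r / norm_rr
        xi.append((d - 2.0 * x + r) / r)
    return xi


def jackknife_covariance(xi_jk):
    """Covariance matrix from N delete-one jack-knife estimates.

    xi_jk[k][i] is xi in bin i measured with subsection k removed.
    """
    n = len(xi_jk)
    nbins = len(xi_jk[0])
    mean = [sum(s[i] for s in xi_jk) / n for i in range(nbins)]
    cov = [[0.0] * nbins for _ in range(nbins)]
    for s in xi_jk:
        for i in range(nbins):
            for j in range(nbins):
                cov[i][j] += (s[i] - mean[i]) * (s[j] - mean[j])
    factor = (n - 1) / n  # delete-one jack-knife normalization
    return [[factor * c for c in row] for row in cov]
```

With 32 subsections, `xi_jk` would hold 32 correlation functions, each measured with one subsection of the mask left out.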
Fig. 1. (Top) The 2PCF for AGN and all SDSS galaxies.
We also show the 2PCF of Zehavi et al. (2002). The solid line is
the model from Section 4.2. (Bottom) The ratio of the AGN and
galaxy 2PCFs.



In Figure 1 we show the redshift-space 2PCF for both
the AGN (stars) and all galaxies (filled dots) samples
discussed in Section 2. We also show the SDSS
redshift-space 2PCF of Zehavi et al. (2002). As expected, our
galaxy 2PCF agrees with Zehavi et al. (2002), except on
small scales, where our sample is incomplete because of
fiber collisions (Blanton et al. 2003a), whereas Zehavi et al.
attempt to correct for these collisions. We note that since the fiber
collisions are uniformly distributed over the survey (see
Blanton et al. 2003a), they affect both the galaxy and
AGN 2PCF in the same way. Thus we study only
relative differences between 2PCFs here. We measure a
simple $\chi^2$ statistic between the two samples and account
for errors on both 2PCFs by combining their
individual covariances. Specifically, we take the square root of
the sum of the squared covariances, which accounts for
correlations between data points of different separations
in both datasets. We find $\chi^2 = 13$ with 9 degrees of
freedom for pair separations greater than $\sim 1\,h^{-1}$ Mpc,
i.e., there is no significant difference between the AGN
and galaxy 2PCF. We show the ratio of the two 2PCFs
in the bottom panel of Figure 1. The weighted mean ratio is

$\xi_{\rm AGN}/\xi_{\rm gal} = 0.974 \pm 0.026$.
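
The comparison above can be illustrated with a short sketch. The text's own recipe for combining the two covariance matrices is not reproduced here. The code instead assumes the standard choice for two independent measurements, adding the covariance matrices, and computes $\chi^2 = d^T C^{-1} d$ for the difference vector $d$. The inverse-variance weighted mean is likewise an assumed form of the "weighted mean ratio", and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def chi2_difference(xi_a, xi_b, cov_a, cov_b):
    """chi^2 between two 2PCFs, assuming their covariances simply add."""
    n = len(xi_a)
    d = [xi_a[i] - xi_b[i] for i in range(n)]
    cov = [[cov_a[i][j] + cov_b[i][j] for j in range(n)] for i in range(n)]
    y = solve(cov, d)  # y = C^-1 d without forming the inverse
    return sum(d[i] * y[i] for i in range(n))


def weighted_mean(values, errors):
    """Inverse-variance weighted mean and its uncertainty."""
    w = [1.0 / e ** 2 for e in errors]
    mean = sum(wi * v for wi, v in zip(w, values)) / sum(w)
    return mean, 1.0 / math.sqrt(sum(w))
```

Here `values` would be the per-bin ratios of the two correlation functions and `errors` their per-bin uncertainties.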

We show in Figure 2 the ratio of the AGN-galaxy
cross-correlation function (see also Croft et al. 1999;
Croom et al. 2003) to the normal galaxy 2PCF, again for
the samples defined in Section 2. The weighted mean
ratio between these two functions is
$\xi_{\rm AGN-gal}/\xi_{\rm gal-gal} = 0.922 \pm 0.028$.
Within the one-sigma uncertainties, this
is consistent with Croom et al. (2003), who demonstrate
that, for z < 0.3 quasars, the ratio of the quasar-galaxy
cross-correlation function to the galaxy auto-correlation
function is $\xi_{\rm QSO-gal}/\xi_{\rm gal-gal} = 0.97 \pm 0.05$.

Fig. 2. The ratio of the AGN-galaxy cross-correlation function
to the galaxy-galaxy auto-correlation function.

Finally, we measure the 2PCF as a function of AGN
activity as measured by the luminosity of the forbidden
[OIII] narrow emission line ($L_{\rm [OIII]}$). This line is
proposed to be only weakly affected by any residual
star formation (see Kauffmann et al. 2003). However, the
exact connection between the [OIII] strength and AGN
activity is not well established in narrow-line systems (see
Nelson 2000; Boroson 2002; Mathur 2000; Mathur et al.
2001; Grupe & Mathur 2004). We created two
sub-samples of AGN, one containing the highest third of the
$L_{\rm [OIII]}$ distribution ($> 4.84 \times 10^{[\ldots]}$ erg s$^{-1}$) and one with
the lowest third ($< 1.29 \times 10^{[\ldots]}$ erg s$^{-1}$). In order to
minimize any selection bias (e.g., Malmquist bias), we
construct a pseudo-volume-limited sample by restricting the
AGN sample to a redshift range of 0.06 < z < 0.085 and
to a k-corrected (Blanton et al. 2003b) absolute
magnitude limit of $M_r < -19.8$ (see Gomez et al. 2003;
Balogh et al. 2004). This provides a sample of 2457
AGN. We find that the distributions of host galaxy
absolute magnitudes and redshifts are identical for the low
and high $L_{\rm [OIII]}$ samples and match those of the entire galaxy sample.
In Figure 3 we present the AGN 2PCF as a function of
$L_{\rm [OIII]}$. We see a noticeable difference in the amplitude
of clustering, with the lower luminosity AGN having a
stronger clustering amplitude. We calculate the $\chi^2$
difference between the high and low $L_{\rm [OIII]}$ sub-samples
and find $\chi^2 = 40$ with 5 degrees of freedom. Therefore,
the 2PCFs for the high and low $L_{\rm [OIII]}$ sub-samples are
different at the $> 5\sigma$ level. We also see a similar
difference in the clustering amplitudes if we split the AGN
sample as a function of the width of the [OIII] emission
line.
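
The step from a $\chi^2$ value to a significance level can be checked with a small sketch. It uses the closed-form series for the $\chi^2$ survival function at integer degrees of freedom and converts the resulting p-value to a two-sided Gaussian-equivalent significance. The two-sided convention and the function names are assumptions, since the text does not say how the significance was derived.

```python
import math
from statistics import NormalDist


def chi2_sf(x, k):
    """Survival function P(chi^2_k > x) for integer degrees of freedom k."""
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{r < k/2} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, k // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # Odd k: erfc(sqrt(x/2)) + 2 phi(sqrt(x)) * sum_r x^(r-1/2) / (2r-1)!!
    root = math.sqrt(x)
    total, term = 0.0, root
    for r in range(1, (k - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    pdf = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * pdf * total


def sigma_equivalent(p):
    """Two-sided Gaussian-equivalent significance of a p-value."""
    return -NormalDist().inv_cdf(p / 2.0)
```

Under these assumptions, `chi2_sf(13, 9)` is roughly 0.16, in line with no significant difference between the AGN and galaxy 2PCFs, while `sigma_equivalent(chi2_sf(40, 5))` comes out above 5.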

4. DISCUSSION 
4.1. The Properties of AGN Host Galaxies 

Throughout this paper we assume that the measured
galaxy properties reflect those of the host galaxies
and are not significantly affected by the AGN
light. Both Schmitt, Storchi-Bergmann, & Fernandes
(1999) and Kauffmann et al. (2003) find that the AGN
contribution to the total luminosity in narrow-line AGN
like those studied here rarely exceeds 5%. We measure
the AGN contribution using the ratio of the total flux
to the flux within the 3" fiber and find that the AGN
contribute on average < 6% of the total light.

Using our magnitude- and volume-limited sample, we
find that the distributions of the AGN host galaxy
properties are different from those of all galaxies
(Figure 4). For example, the concentration index (C;
Shimasaku et al. 2001) and e-class (Stoughton et al.
2002; Connolly & Szalay 1999) distributions for our AGN
sample appear to be preferentially missing the bluest (in
e-class), disk-dominated galaxies (see also Figure 12 in
Miller et al. 2003). The most striking difference is for
the D4000 index, a measure of the 4000 Å break, and
the u - r color, where the AGN distributions do not
exhibit the bi-modal shape seen for all galaxies.

Fig. 3. The AGN 2PCF as a function of $L_{\rm [OIII]}$. We show
the 2PCF for AGN in the top and bottom third of the $L_{\rm [OIII]}$
distribution. The solid line is for all SDSS galaxies taken from
Figure 1.

Fig. 4. The distributions of galaxy properties (described in the
text) for our volume-limited sample. All distributions have been
re-normalized to have the same total number.

One possible explanation for the difference between the 
AGN and all galaxy distributions is our exclusion of the 
broad-line QSOs. To investigate this, we have used the 
94 broad-line SDSS QSOs within 0.06 < z < 0.085 and 
brighter than $M_r = -19.8$. These QSO host galaxies have
a wide variety of morphologies, while 20% lack an
obvious host galaxy (i.e., they appear point-like in the SDSS),
and so any k-corrections on this population could be
inaccurate. However, the observed-frame colors of all of
the QSOs are on average bluer than our AGN sample, as
expected if the QSO component dominates over the light
from the host galaxy. Even so, the total number of these
broad-line QSOs is simply too small to account for the
missing blue host galaxy populations discussed above.

Another possibility comes from our AGN selection
criteria. In particular, our signal-to-noise limit on the
SDSS spectra could preferentially exclude the lowest
luminosity AGN, as they could be either buried in strongly
star-forming bulges or accreting at a very low rate in the
reddest, oldest galaxies. Dust obscuration could also
significantly affect our detection of AGN, especially for the
strongly star-forming (bluest) galaxies (Hopkins et al.
2003). For instance, $\sim 30$% of our galaxies have emission
lines but could not be classified as either star-forming or
AGN, and could be obscured AGN. Miller et al. (2003)
attempted to statistically model this population using
colors and noted that there was a significant red
population, which were most likely AGN. Using this model,
we have classified these unidentified emission-line
galaxies (ELUs) as AGN. This has the effect of adding mainly
red (but some blue) galaxies to the histograms in Figure
4, although they are still dominated by galaxies
intermediate between blue and red (low and high D4000). We
recalculate the 2PCF including these model-dependent
AGN classifications, and find no statistical difference
from the AGN 2PCF that ignores the ELUs. We have
not attempted to subtract off the stellar components of
the SDSS spectra (see, e.g., Hao & Strauss 2004), and so
we cannot rigorously address those galaxies which have
both star-forming and AGN components. However, as
noted in Miller et al. (2003), the fraction of late-type
(e.g., spiral) morphologically classified galaxies
harboring an AGN is $\sim 20$%, which is similar to that found by
Ho (2004), who does subtract off stellar templates. The
consistency between Ho (2004) and Miller et al. (2003)
suggests that there are not large numbers of AGN in
strongly star-forming galaxies that we fail to detect.

In summary the exclusion of the QSOs does not ex- 
plain why our AGN sample is lacking the bi-modal color 
distribution of the whole galaxy sample. Likewise, while 
we are certainly missing some AGN in the unidentified 
emission-line galaxy population, our measured 2PCF is 
not altered after we attempt to include them. These is- 
sues could be addressed in a more detailed way through 
a multi-wavelength study of these unidentified
emission-line objects. Therefore, as suggested in Miller et al.
(2003), our AGN sample appears to be an unbiased
tracer, with respect to mass of the whole galaxy
population, for the large-scale structure in the local Universe.

Given that the typical AGN host properties are not a
random sub-sample of all galaxies, it is somewhat
surprising that they should cluster the same way as all
galaxies. For example, it has been shown that the 2PCF is a
strong function of both the color and luminosity of
galaxies (Davis & Geller 1976; Hamilton 1988; Zehavi et al.
2002), indicating that the bluest, youngest galaxies
preferentially populate the lowest density regions in the
Universe, while the reddest, oldest galaxies preferentially live
in the densest regions (e.g., Oemler 1974; Dressler 1980;
Gomez et al. 2003; Balogh et al. 2004). Removing these
two tails of the distribution could result in an
intermediate population that clusters the same way as the whole
sample. Assuming that the existence of an AGN is
independent of environment (see Miller et al. 2003), one can
conclude that the mean mass of the AGN dark matter
halos must be the same as the mean for all galaxies (see
next Section).
4.2. The Mass Scale Selected by the AGN

We have used the model of Di Matteo et al. (2003b)
to make a prediction for the AGN 2PCF. In
this model, cosmological hydrodynamical simulations
(Springel & Hernquist 2003a,b) were used to link the
growth and activity of central black holes in galaxies to
the formation of spheroids in galaxy halos. In the
prescription used for black hole growth in the simulations,
it is assumed that the black hole fueling rate is regulated
by star formation in the gas. This simple assumption
was shown to explain both the observed $M$-$\sigma$ relation
(Ferrarese & Merritt 2000; Gebhardt et al. 2000) and the
broad properties of the AGN luminosity function (for an
assumed quasar lifetime). We use this model to
construct a mock AGN catalog and deduce that the
minimum dark matter halo mass ($M_{\rm min}$) in the simulations
that best matches the observed space density of SDSS
AGN ($n \sim 1.5 \times 10^{-3}$ Mpc$^{-3}$) is $M_{\rm min} > 2 \times 10^{12}\,M_\odot$.
This is representative of low redshift $L^*$ galaxies, which
provide the bulk of our AGN sample. Not surprisingly,
the 2PCF for simulated dark matter halos with masses
greater than $M_{\rm min}$ (shown in Figure 1) agrees well with
the observed 2PCF (also based on $\sim L^*$ and brighter
galaxies). As mentioned in the last section, we expect
the AGN clustering amplitude to match that of the
entire galaxy population when the mean masses of the two
populations are similar.
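
The idea of choosing a mass threshold to match an observed space density can be sketched very simply. The version below is a deliberate simplification and not the Di Matteo et al. (2003b) procedure: it assumes that every halo above the threshold hosts exactly one AGN, so that $M_{\rm min}$ is the mass of the N-th most massive halo, with N set by the target density times the box volume. The halo catalog, box volume, and function name are hypothetical inputs.

```python
def minimum_halo_mass(halo_masses, box_volume, target_density):
    """Mass threshold M_min such that n(>= M_min) matches target_density.

    halo_masses    : halo masses from a simulation box (e.g. in M_sun)
    box_volume     : comoving box volume, in units matching the density
    target_density : observed number density to reproduce
    """
    n_select = int(round(target_density * box_volume))
    if n_select < 1 or n_select > len(halo_masses):
        raise ValueError("target density cannot be matched with this catalog")
    ranked = sorted(halo_masses, reverse=True)
    # The n_select most massive halos reproduce the target density.
    return ranked[n_select - 1]
```

The 2PCF of the halos above the returned threshold can then be compared with the observed AGN 2PCF.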

By relating the dark matter halo to the black hole mass 
(according to Eqn. 8 of Di Matteo et al. 2003b; see also 
the observed correlation by Ferrarese 2001 and Baes et al. 
2003), we deduce a mean black hole mass for our sample
of AGN of $M_{\rm BH} \sim 10^{8}\,M_\odot$.

4.3. Clustering and AGN Activity 

In Figure 3 we show that the 2PCF for the lowest
$L_{\rm [OIII]}$ AGN in our sample appears to have a higher
clustering amplitude than the highest $L_{\rm [OIII]}$ AGN. We test
whether this amplitude difference is a result of the
differing AGN host galaxy distributions. From the full AGN
sample we randomly construct two sub-samples that
possess the same D4000 distributions as shown in Figure 4
for the low and high $L_{\rm [OIII]}$ samples, but with no regard
to $L_{\rm [OIII]}$. We find no difference in their respective
2PCFs. We repeat the test for u - r color, e-class and
concentration index and again find no difference. These
tests demonstrate that the difference seen in the
clustering strengths between the low and high $L_{\rm [OIII]}$ samples
is driven by the [OIII] emission line and not by the
underlying galaxy properties in Figure 4. As an additional
test, we split our AGN sample into highest and lowest
thirds of their D4000, u - r color, e-class, and
concentration index, regardless of $L_{\rm [OIII]}$. In most cases, we
do see differences in the 2PCF of these subsamples; e.g.,
the high D4000 sample is more strongly clustered than
the low D4000 AGN sample. Likewise, the redder AGN
are more clustered than the bluer AGN. However, the
difference in clustering amplitude is strongest when the
AGN are split by $L_{\rm [OIII]}$.
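
The control test described above, drawing random sub-samples that share a given D4000 distribution while ignoring $L_{\rm [OIII]}$, can be sketched as a histogram-matched random draw. The binning, the sampling without replacement, and the function names are assumptions made for illustration. The text does not specify how the matched sub-samples were built.

```python
import random


def bin_index(value, edges):
    """Index of the bin [edges[i], edges[i+1]) containing value, or None."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None


def matched_subsample(pool, target, edges, seed=0):
    """Draw indices from pool so their histogram matches that of target.

    pool   : property values (e.g. D4000) for the full AGN sample
    target : property values of the sample whose distribution is copied
    edges  : histogram bin edges
    """
    rng = random.Random(seed)
    counts = [0] * (len(edges) - 1)
    for v in target:
        b = bin_index(v, edges)
        if b is not None:
            counts[b] += 1
    members = [[] for _ in counts]
    for idx, v in enumerate(pool):
        b = bin_index(v, edges)
        if b is not None:
            members[b].append(idx)
    chosen = []
    for b, need in enumerate(counts):
        # Cap at the number available so the draw is without replacement.
        chosen.extend(rng.sample(members[b], min(need, len(members[b]))))
    return chosen
```

Running this once with the low $L_{\rm [OIII]}$ sample as `target` and once with the high $L_{\rm [OIII]}$ sample gives two sub-samples whose 2PCFs can be compared.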

In hierarchical models of structure formation, more 
massive dark matter halos are more strongly clustered. 
Therefore, the fact that the low $L_{\rm [OIII]}$ AGN sub-sample
has a higher clustering amplitude indicates that the host
dark matter halos of these AGN must be preferentially
more massive than those of the high $L_{\rm [OIII]}$ AGN. Furthermore,
$L_{\rm [OIII]}$ delineates the high and low mass halos better
than do other host galaxy properties (like D4000 or
color). If, as expected, the mass of the black hole
correlates with the halo mass, then the weaker $L_{\rm [OIII]}$ AGN
must have larger black holes; therefore, a low $L_{\rm [OIII]}$
can only be caused by a low fueling rate. These
observations are in accordance with studies of nearby
massive ellipticals, which are known to host the largest black
holes (consistent with the $M$-$\sigma$ relation; Gebhardt et al.
2000; Ferrarese & Merritt 2000), but which typically
display the weakest AGN (Ho, Filippenko, & Sargent 1997;
Di Matteo et al. 1999, 2003a). Conversely, the high
$L_{\rm [OIII]}$ AGN have a lower clustering amplitude,
consistent with them occupying lower mass dark matter halos
(hence having smaller central black holes) but accreting
at a high rate.